Massachusetts General Hospital

Collaborative effort to investigate the plasma proteomic signatures of COVID-19 positive patients

Data provided by the MGH Emergency Department COVID-19 Cohort (Filbin, Goldberg, Hacohen) with Olink Proteomics. https://www.olink.com/mgh-covid-study/

Initial analysis

Plasma proteomics reveals tissue-specific cell death and mediators of cell-cell interactions in severe COVID-19 patients

https://www.biorxiv.org/content/10.1101/2020.11.02.365536v2

By analyzing several thousand plasma proteins in 306 COVID-19 patients and 78 symptomatic controls over serial timepoints using two complementary approaches, we uncover COVID-19 host immune and non-immune proteins not previously linked to this disease. Integration of plasma proteomics with nine published scRNAseq datasets shows that SARS-CoV-2 infection upregulates monocyte/macrophage, plasmablast, and T cell effector proteins. By comparing patients who died to severely ill patients who survived, we identify dynamic immunomodulatory and tissue-associated proteins associated with survival, providing insights into which host responses are beneficial and which are detrimental to survival.

Study design

Proposed Analyses

  1. We reproduce the results obtained in Filbin et al 2020 using the Olink data (timepoint D0)

  2. We compare COVID-19 positive patients to identify biomarkers of severity (timepoint D0)

Note: To define the different severity groups we make use of the WHO scores assigned to the patients.

[1]:
import os

from ckg.analytics_core.analytics import analytics
from ckg.analytics_core.viz import viz

from ckg.graphdb_connector import connector
from ckg.graphdb_builder import builder_utils

from plotly.offline import init_notebook_mode, iplot
%matplotlib inline
init_notebook_mode(connected=True)
c:\users\sande\.conda\envs\pip_rev\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.3.11, the latest is 0.3.12.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  **kwargs
WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.

Reading proteomics and clinical data

[2]:
#### Access: Data provided by the MGH Emergency Department COVID-19 Cohort
#### (Filbin, Goldberg, Hacohen) with Olink Proteomics. https://www.olink.com/mgh-covid-study/

olink_file = '../../assets/MGH_COVID_OLINK_NPX.txt'
olink_clin_file = '../../assets/MGH_COVID_Clinical_Info.txt'

Proteomics data

[3]:
olink_data = builder_utils.readDataFromCSV(olink_file, sep=';', comment='#')
[4]:
olink_data.head()
[4]:
SampleID subject_id Timepoint Index OlinkID UniProt Assay MissingFreq Panel Panel_Version PlateID QC_Warning LOD NPX
0 1_D0 1.0 D0 56 OID21311 Q9BTE6 AARSD1 0.04 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.7204 3.2277
1 1_D0 1.0 D0 56 OID20921 Q96IU4 ABHD14B 0.06 NEUROLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.5696 0.7205
2 1_D0 1.0 D0 56 OID21280 P00519 ABL1 0.04 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.5313 2.6293
3 1_D0 1.0 D0 56 OID21269 P09110 ACAA1 0.12 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 2.0588 3.2670
4 1_D0 1.0 D0 56 OID20159 P16112 ACAN 0.04 CARDIOMETABOLIC 1 20200772_Plate5_CARDIO_INF Pass 1.1623 2.0308

QC and data exploration

[5]:
print("Number of proteins:", len(olink_data['UniProt'].unique()))
Number of proteins: 1420
[6]:
olink_data.shape
[6]:
(1148916, 14)
Remove proteins with QC warnings
[7]:
olink_data['QC_Warning'].unique()
[7]:
array(['Pass', 'Warning'], dtype=object)
[8]:
olink_data.groupby('QC_Warning').count()['SampleID']
[8]:
QC_Warning
Pass       1136634
Warning      12282
Name: SampleID, dtype: int64
[9]:
olink_data = olink_data[olink_data['QC_Warning'] == 'Pass']
[10]:
olink_data.shape
[10]:
(1136634, 14)
[11]:
print("Number of proteins that passed Olink QC:", len(olink_data['UniProt'].unique()))
Number of proteins that passed Olink QC: 1420
[12]:
print("Total number of patients:", len(olink_data['subject_id'].unique()))
Total number of patients: 384
[13]:
print("Total number of samples:", len(olink_data['SampleID'].unique()))
Total number of samples: 786
[14]:
olink_data.shape
[14]:
(1136634, 14)
Remove rows with missing subject id
[15]:
olink_data = olink_data.dropna(subset=['subject_id'])
[16]:
olink_data['subject_id'] = olink_data['subject_id'].astype('int').astype('str')
[17]:
olink_data.shape
[17]:
(1108054, 14)
Merge protein names and identifiers
[18]:
olink_data['identifier'] = olink_data['Assay'] +"~"+olink_data['UniProt']
[19]:
olink_data.head()
[19]:
SampleID subject_id Timepoint Index OlinkID UniProt Assay MissingFreq Panel Panel_Version PlateID QC_Warning LOD NPX identifier
0 1_D0 1 D0 56 OID21311 Q9BTE6 AARSD1 0.04 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.7204 3.2277 AARSD1~Q9BTE6
1 1_D0 1 D0 56 OID20921 Q96IU4 ABHD14B 0.06 NEUROLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.5696 0.7205 ABHD14B~Q96IU4
2 1_D0 1 D0 56 OID21280 P00519 ABL1 0.04 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 0.5313 2.6293 ABL1~P00519
3 1_D0 1 D0 56 OID21269 P09110 ACAA1 0.12 ONCOLOGY 1 20200772_Plate5_NEURO_ONC Pass 2.0588 3.2670 ACAA1~P09110
4 1_D0 1 D0 56 OID20159 P16112 ACAN 0.04 CARDIOMETABOLIC 1 20200772_Plate5_CARDIO_INF Pass 1.1623 2.0308 ACAN~P16112

Clinical data

[20]:
olink_clin_data = builder_utils.readDataFromCSV(olink_clin_file, sep=';', comment='#')
[21]:
olink_clin_data.head()
[21]:
subject_id COVID Age cat BMI cat HEART LUNG KIDNEY DIABETES HTN IMMUNO ... crp_3_cat ddimer_3_cat ldh_3_cat abs_neut_7_cat abs_lymph_7_cat abs_mono_7_cat creat_7_cat crp_7_cat ddimer_7_cat ldh_7_cat
0 1 1 1 4 0 0 0 0 0 0 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
1 2 1 2 2 0 0 0 0 0 0 ... 2.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 3 1 3 4 0 1 0 0 0 0 ... 3.0 2.0 3.0 NaN NaN NaN NaN NaN NaN NaN
3 4 1 1 2 0 0 0 0 0 0 ... 2.0 2.0 3.0 NaN NaN NaN NaN NaN NaN NaN
4 5 1 3 3 0 0 0 1 1 0 ... 5.0 3.0 NaN 3.0 5.0 3.0 1.0 4.0 4.0 3.0

5 rows × 40 columns

[22]:
plot = viz.get_boxplot_grid(olink_clin_data, identifier="clin_vars", args={"title": "Clinical variables",
                                                                           "x":"COVID",
                                                                           "y":"Age cat",
                                                                           "color":"COVID", "width":600})
iplot(plot.figure)
[23]:
olink_clin_data['subject_id'] = olink_clin_data['subject_id'].astype('str')
[24]:
olink_clin_data.columns.tolist()
[24]:
['subject_id',
 'COVID',
 'Age cat',
 'BMI cat',
 'HEART',
 'LUNG',
 'KIDNEY',
 'DIABETES',
 'HTN',
 'IMMUNO',
 'Resp_Symp',
 'Fever_Sympt',
 'GI_Symp',
 'WHO 0',
 'WHO 3',
 'WHO 7',
 'WHO 28',
 'WHO max',
 'abs_neut_0_cat',
 'abs_lymph_0_cat',
 'abs_mono_0_cat',
 'creat_0_cat',
 'crp_0_cat',
 'ddimer_0_cat',
 'ldh_0_cat',
 'Trop_72h',
 'abs_neut_3_cat',
 'abs_lymph_3_cat',
 'abs_mono_3_cat',
 'creat_3_cat',
 'crp_3_cat',
 'ddimer_3_cat',
 'ldh_3_cat',
 'abs_neut_7_cat',
 'abs_lymph_7_cat',
 'abs_mono_7_cat',
 'creat_7_cat',
 'crp_7_cat',
 'ddimer_7_cat',
 'ldh_7_cat']

Merging both datasets

[25]:
data = olink_data.set_index('subject_id').join(olink_clin_data.set_index('subject_id')).reset_index()
[26]:
data.head()

[26]:
subject_id SampleID Timepoint Index OlinkID UniProt Assay MissingFreq Panel Panel_Version ... crp_3_cat ddimer_3_cat ldh_3_cat abs_neut_7_cat abs_lymph_7_cat abs_mono_7_cat creat_7_cat crp_7_cat ddimer_7_cat ldh_7_cat
0 1 1_D0 D0 56 OID21311 Q9BTE6 AARSD1 0.04 ONCOLOGY 1 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
1 1 1_D0 D0 56 OID20921 Q96IU4 ABHD14B 0.06 NEUROLOGY 1 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
2 1 1_D0 D0 56 OID21280 P00519 ABL1 0.04 ONCOLOGY 1 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
3 1 1_D0 D0 56 OID21269 P09110 ACAA1 0.12 ONCOLOGY 1 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
4 1 1_D0 D0 56 OID20159 P16112 ACAN 0.04 CARDIOMETABOLIC 1 ... 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN

5 rows × 54 columns

[27]:
data.shape
[27]:
(1108054, 54)

1. Covid-19 positive vs negative

[28]:
data = data[data['Timepoint'] == "D0"]
[29]:
data.shape
[29]:
(542015, 54)
[30]:
print("Total number of positive patients:", len(data['subject_id'].unique()))
Total number of positive patients: 383
[31]:
df = data[['subject_id', 'SampleID', 'identifier', 'WHO max', 'WHO 0', 'NPX', 'Age cat', 'COVID', 'BMI cat', 'Timepoint', 'HEART']]
[32]:
print("Total number of positive patients:", len(data['subject_id'].unique()))
Total number of positive patients: 383
[33]:
df_wide = analytics.transform_into_wide_format(df, index=['SampleID', 'subject_id'], columns=['identifier'], values='NPX', extra=['WHO max', 'WHO 0', 'Age cat', 'COVID', 'BMI cat','HEART'])
[34]:
df_wide.head()
[34]:
SampleID subject_id AARSD1~Q9BTE6 ABHD14B~Q96IU4 ABL1~P00519 ACAA1~P09110 ACAN~P16112 ACE2~Q9BYF1 ACOX1~Q15067 ACP5~P13686 ... YES1~P07947 YTHDF3~Q7Z739 ZBTB16~Q05516 ZBTB17~Q13105 WHO max WHO 0 Age cat COVID BMI cat HEART
0 100_D0 100 3.2374 1.9080 3.4624 3.7201 2.8415 1.4208 0.3747 4.6433 ... 5.0236 0.4053 2.9979 1.2529 4 4 1 1 4 0
1 101_D0 101 2.1038 1.1206 2.0700 2.0108 2.6067 0.4767 -0.0337 4.6615 ... 3.3785 0.2607 1.9315 1.1605 4 4 2 1 1 0
2 102_D0 102 2.7613 1.3493 2.8657 2.8731 2.0461 0.3986 0.2983 4.4803 ... 4.6267 0.4771 1.7620 0.6840 1 1 5 1 3 0
3 103_D0 103 2.6384 0.9447 1.4727 2.6550 1.6613 0.6749 0.1486 3.7081 ... 2.9520 0.4808 0.9480 0.4823 6 6 1 1 5 0
4 104_D0 104 5.3336 1.5130 2.0466 1.8627 2.4433 1.5736 -0.1446 3.2100 ... 2.3919 0.0496 1.3069 1.1057 6 6 5 0 3 1

5 rows × 1428 columns

[35]:
df_wide.shape
[35]:
(383, 1428)
[36]:
df_wide.describe()
[36]:
AARSD1~Q9BTE6 ABHD14B~Q96IU4 ABL1~P00519 ACAA1~P09110 ACAN~P16112 ACE2~Q9BYF1 ACOX1~Q15067 ACP5~P13686 ACP6~Q9NPH0 ACTA2~P62736 ... YES1~P07947 YTHDF3~Q7Z739 ZBTB16~Q05516 ZBTB17~Q13105 WHO max WHO 0 Age cat COVID BMI cat HEART
count 380.000000 380.000000 380.000000 380.000000 378.000000 374.000000 378.000000 383.000000 383.000000 374.000000 ... 380.000000 378.000000 380.000000 374.000000 383.000000 383.000000 383.000000 383.000000 383.000000 383.000000
mean 3.532293 1.842658 2.760268 3.463129 2.477235 1.468916 0.137929 4.213379 3.877484 0.186726 ... 3.582903 0.771492 1.751017 1.423486 3.498695 3.832898 3.219321 0.796345 2.480418 0.185379
std 1.400306 0.953734 0.991535 1.378123 0.566340 0.933315 0.371251 0.664486 0.799831 0.902741 ... 1.538310 0.839493 0.975425 0.676457 1.510651 1.254631 1.208150 0.403242 1.152738 0.389113
min 0.395300 -0.416000 0.722100 0.541900 0.943900 -0.304900 -1.231400 2.751800 -0.470600 -1.322600 ... 0.883800 -1.478300 0.176700 -0.153700 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000
25% 2.523550 1.208275 2.030875 2.616600 2.053875 0.855425 -0.075375 3.718600 3.374700 -0.405800 ... 2.253875 0.206750 1.057350 0.978850 2.000000 2.000000 2.000000 1.000000 2.000000 0.000000
50% 3.227700 1.663900 2.676800 3.310600 2.500100 1.253250 0.145550 4.222300 3.888400 -0.038550 ... 3.395650 0.719950 1.520150 1.330750 4.000000 4.000000 3.000000 1.000000 2.000000 0.000000
75% 4.082625 2.237375 3.296025 4.087600 2.840300 1.855025 0.374600 4.653950 4.314900 0.432250 ... 4.673750 1.309325 2.259025 1.771600 4.000000 5.000000 4.000000 1.000000 3.000000 0.000000
max 8.334200 5.339900 6.507100 9.427600 4.161700 6.196200 1.760600 6.656400 7.554500 4.327900 ... 8.330600 4.298200 5.154800 4.485200 6.000000 6.000000 5.000000 1.000000 5.000000 1.000000

8 rows × 1426 columns

[ ]:

[37]:
result = analytics.run_correlation(df_wide, alpha=0.05, subject='subject_id', group="COVID", method="spearman", correction='fdr_bh')
[38]:
net = viz.get_network(result, identifier='corr_net', args={'source':'node1', 'target':'node2',
                                                     'weight':'weight', 'values':'weight',
                                                           'title':'Correlation Network',
                                                        'color_weight': True, 'node_size':'degree',
                                                           'cutoff': 0.5, 'cutoff_abs':True})
[39]:
viz.visualize_notebook_network(net['notebook'])
[40]:
df_wide.to_csv('olink_data_NPX_values_WHO_max.tsv', sep='\t', index=False, header=True, doublequote=False)
  • Imputation using mixed model: KNN when 60% valid values or Probabilistic Minimum Imputation otherwise

  • Imputation per group based on WHO 0

[41]:
df_wide = analytics.imputation_mixed_norm_KNN(df_wide, index_cols=['WHO 0', 'WHO max', 'SampleID', 'subject_id', 'Age cat', 'COVID', 'BMI cat', 'HEART'], shift=1.8, nstd=0.3, group='COVID', cutoff=0.6)
[42]:
df_wide.head()
[42]:
AARSD1~Q9BTE6 ABHD14B~Q96IU4 ABL1~P00519 ACAA1~P09110 ACAN~P16112 ACE2~Q9BYF1 ACOX1~Q15067 ACP5~P13686 ACP6~Q9NPH0 ACTA2~P62736 ... WISP2~O76076 WNT9A~O14904 WWP2~O00308 XCL1~P47992 XG~P55808 XPNPEP2~O43895 YES1~P07947 YTHDF3~Q7Z739 ZBTB16~Q05516 ZBTB17~Q13105
WHO 0 WHO max SampleID subject_id Age cat COVID BMI cat HEART
4 4 100_D0 100 1 1 4 0 3.2374 1.9080 3.4624 3.7201 2.8415 1.4208 0.3747 4.6433 4.0392 -0.4904 ... 4.4080 0.1052 5.0765 2.5597 1.7649 5.1321 5.0236 0.4053 2.9979 1.2529
101_D0 101 2 1 1 0 2.1038 1.1206 2.0700 2.0108 2.6067 0.4767 -0.0337 4.6615 3.4319 -0.3363 ... 3.8144 0.4324 3.8148 2.0496 1.4435 4.4233 3.3785 0.2607 1.9315 1.1605
1 1 102_D0 102 5 1 3 0 2.7613 1.3493 2.8657 2.8731 2.0461 0.3986 0.2983 4.4803 3.1530 -0.3694 ... 4.6518 0.4650 4.7405 1.3652 1.6223 4.4243 4.6267 0.4771 1.7620 0.6840
6 6 103_D0 103 1 1 5 0 2.6384 0.9447 1.4727 2.6550 1.6613 0.6749 0.1486 3.7081 3.6198 -0.9642 ... 4.2031 0.2012 3.9818 1.7696 1.7539 2.2632 2.9520 0.4808 0.9480 0.4823
104_D0 104 5 0 3 1 5.3336 1.5130 2.0466 1.8627 2.4433 1.5736 -0.1446 3.2100 3.8780 1.3395 ... 5.3859 1.1573 4.0152 2.4125 2.0343 3.7573 2.3919 0.0496 1.3069 1.1057

5 rows × 1420 columns

[43]:
df_wide.to_csv('olink_data_NPX_values_imputed_KNN.tsv', sep='\t', index=False, header=True, doublequote=False)
[44]:
df_wide = df_wide.reset_index()
[45]:
df_wide['COVID'] = ['COVID-19 positive' if c else 'COVID-19 negative' for c in df_wide['COVID']]
[46]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'Age cat', 'WHO max', 'BMI cat', 'HEART'], group='COVID', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 250
args['loadings'] = 15
args['title'] = 'Olink data'
[47]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)
[48]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'Age cat', 'COVID', 'BMI cat', 'HEART', 'WHO max'], group='WHO 0', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 250
args['loadings'] = 15
args['title'] = 'Olink data coloured by WHO score at timepoint D0'
[49]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)
[50]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'Age cat', 'COVID', 'BMI cat', 'HEART', 'WHO 0'], group='WHO max', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 250
args['loadings'] = 15
args['title'] = 'Olink data coloured by maximum WHO score'
[51]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)
[52]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'WHO max', 'COVID', 'BMI cat', 'HEART'], group='Age cat', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 250
args['loadings'] = 15
args['title'] = 'Olink data coloured by age category'
[53]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)
[54]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'WHO max', 'COVID', 'BMI cat'], group='HEART', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 350
args['loadings'] = 20
args['title'] = 'Olink data coloured by HEART'
[55]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)
[56]:
pca, args = analytics.run_pca(df_wide, drop_cols=['SampleID', 'subject_id', 'WHO max', 'COVID', 'HEART'], group='BMI cat', annotation_cols=['SampleID'])
args['group'] = 'group'
args['hovering_cols'] = ['SampleID']
args['factor'] = 350
args['loadings'] = 20
args['title'] = 'Olink data coloured by BMI cat'
[57]:
figure = viz.get_pca_plot(pca, identifier='pca', args=args)
iplot(figure.figure)

Differential regulation: Covid-19 positive vs negative

[58]:
ancova_results = analytics.run_ancova(df_wide, covariates=['Age cat', 'HEART', 'BMI cat'], drop_cols=['SampleID', 'subject_id', 'WHO max'], subject='subject_id', group='COVID', is_logged=True)
[59]:
volcano_plot = viz.run_volcano(ancova_results, identifier='proteomics_volcanos',
            args={'alpha':0.01, 'fc':2, 'num_annotations': 50,
                  'colorscale':'Blues', 'showscale': False,
                  'marker_size':8, 'x_title':'log2FC', 'y_title':'-log10(pvalue)'})

for plot in volcano_plot:
    iplot(plot.figure)

Functional enrichment

[60]:
go_terms_query = "MATCH (p:Protein)-[]-(bp:Biological_process) WHERE (p.name+'~'+p.id) IN {} RETURN DISTINCT (p.name+'~'+p.id) AS identifier,bp.name AS annotation"
go_terms_query = go_terms_query.format(df_wide.columns.tolist())
annotation = connector.run_query(go_terms_query)
[61]:
annotation.head()
[61]:
annotation identifier
0 mitochondrial genome maintenance AKT3~Q9Y243
1 mitochondrial genome maintenance TYMP~P19971
2 regulation of DNA recombination IL7R~P16871
3 cell wall mannoprotein biosynthetic process MPI~P34949
4 very long-chain fatty acid metabolic process ACAA1~P09110
[62]:
annotation.shape
[62]:
(19230, 2)
[63]:
enrichment_results = analytics.run_up_down_regulation_enrichment(ancova_results, annotation, identifier='identifier', groups=['group1', 'group2'], annotation_col='annotation', reject_col='rejected', group_col='group', method='fisher', correction='fdr_bh', alpha=0.01, lfc_cutoff=1)
[64]:
figures = viz.get_enrichment_plots(enrichment_results, identifier='enrichment', args={'width':1800})
for fig in figures:
    iplot(fig.figure)
[ ]:

[65]:
df_wide = df_wide[df_wide['COVID'] == 'COVID-19 positive']
[66]:
df_wide['WHO 0'] = df_wide['WHO 0'].astype('str')
df_wide['WHO max'] = df_wide['WHO max'].astype('str')
[67]:
df_wide.shape
[67]:
(305, 1428)
[68]:
ancova_results = analytics.run_ancova(df_wide, covariates=['Age cat', 'HEART', 'BMI cat'], drop_cols=['SampleID', 'subject_id', 'COVID', 'WHO 0'], subject='subject_id', group='WHO max', is_logged=True)
[69]:
volcano_plot = viz.run_volcano(ancova_results, identifier='proteomics_volcanos',
            args={'alpha':0.01, 'fc':2, 'num_annotations': 50,
                  'colorscale':'Blues', 'showscale': False,
                  'marker_size':8, 'x_title':'log2FC', 'y_title':'-log10(pvalue)'})

for plot in volcano_plot:
    iplot(plot.figure)
[70]:
go_terms_query = "MATCH (p:Protein)-[]-(bp:Biological_process) WHERE (p.name+'~'+p.id) IN {} RETURN DISTINCT (p.name+'~'+p.id) AS identifier,bp.name AS annotation"
go_terms_query = go_terms_query.format(df_wide.columns.tolist())
annotation = connector.run_query(go_terms_query)
[71]:
annotation.head()
[71]:
annotation identifier
0 mitochondrial genome maintenance AKT3~Q9Y243
1 mitochondrial genome maintenance TYMP~P19971
2 regulation of DNA recombination IL7R~P16871
3 cell wall mannoprotein biosynthetic process MPI~P34949
4 very long-chain fatty acid metabolic process ACAA1~P09110
[72]:
annotation.shape
[72]:
(19230, 2)
[73]:
enrichment_results = analytics.run_up_down_regulation_enrichment(ancova_results, annotation, identifier='identifier', groups=['group1', 'group2'], annotation_col='annotation', reject_col='rejected', group_col='group', method='fisher', correction='fdr_bh', alpha=0.01, lfc_cutoff=1)
c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


c:\users\sande\.conda\envs\pip_rev\lib\site-packages\pandas\core\frame.py:6692: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


[74]:
figures = viz.get_enrichment_plots(enrichment_results, identifier='enrichment', args={'width':1800})
for fig in figures:
    iplot(fig.figure)
[ ]:

[ ]: